Skip to content

Conversation

@sumeerbhola
Copy link
Collaborator

Instead, the ReplicaChange.prev field is updated to reflect the latest state reported by the leaseholder.

In addition to simplifyin the code, it fixes an existing issue where an undo could rollback to a state preceding the latest leaseholder state.

Informs #157049

Epic: CRDB-55052

Release note: None

@sumeerbhola sumeerbhola requested review from tbg and wenyihu6 November 18, 2025 23:17
@sumeerbhola sumeerbhola requested review from a team as code owners November 18, 2025 23:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, no major concerns! Thanks for making this simplification, it's nice to see this streamlined further. Basically an LGTM, but I'm a bit fried today so @wenyihu6 should take a look. I also see that you had a TODO that you probably wanted to answer to yourself first.

@tbg reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica line 93 at r1 (raw file):

store-id=1
  range-id=1 load=[80,80,80] raft-cpu=20 config=(num_replicas=3 constraints={'+region=us-west-1:1'} voter_constraints={'+region=us-west-1:1'})
    store-id=1 replica-id=1 type=VOTER_FULL leaseholder=true

oops.


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_enacted_gc line 255 at r1 (raw file):



# TODO: why local-store-id=1?

What is the question here?


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 280 at r1 (raw file):

	// NB: the pointers in change.pendingReplicaChanges are not the same as the
	// pointers in the internal clusterState. We look the latter up using the
	// changeID. The changes slice contains the internal pointers.

I was getting a little confused by this comment and I'm not sure what it's trying to explain to me. clusterState.pendingChanges is a keyed map. var changes is a slice. What is the confusion that could arise without the comment?


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 223 at r1 (raw file):

	//
	// The prev field is mutable after creation, to ensure that an undo restores
	// the state to the latest source of truth from the leaseholder.

❤️


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 471 at r1 (raw file):

// *pendingReplicaChange object, which is mutable. To prevent race conditions
// with exported functions on PendingRangeChange called from outside the
// package, the *pendingReplicaChange objects returned outside the package are

This seems a little iffy. Have you thought about storing the pendingReplicaChange in just one place and having the other locations just store the ChangeID? That way there's only ever one master copy to update, and we can do it "by value" without worrying about this memory held anywhere else.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1602 at r1 (raw file):

				}
				// We did not undo the load change above, or remove it from the various
				// pendingChanges data-structures. We do those things now.

Is this the only place in which this is done? There's enough going on here to not want to have this duplicated.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 280 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I was getting a little confused by this comment and I'm not sure what it's trying to explain to me. clusterState.pendingChanges is a keyed map. var changes is a slice. What is the confusion that could arise without the comment?

The comment is referencing the method parameter change.pendingReplicaChanges and just trying to ensure that the reader realizes that the pointers we successfully lookup in the clusterState.pendingChanges are not expected to be identical, in case someone later things that a pointer equality assertion is a good idea.

I've reworded this comment and moved it down:

// NB: the ch and c pointers are not identical even though they have the
// same changeID. We create two copies in
// clusterState.addPendingRangeChange, since the internal copy is mutable.

pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 471 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This seems a little iffy. Have you thought about storing the pendingReplicaChange in just one place and having the other locations just store the ChangeID? That way there's only ever one master copy to update, and we can do it "by value" without worrying about this memory held anywhere else.

I considered it, but the PendingRangeChange we return to the outside world is used for actually retrieving the explicit changes for the caller to perform (and logging of course). For example, PendingRangeChange.ReplicationChanges() called by MMAStoreRebalancer. We cannot rely on the forgetful storage in MMA to retrieve those changes (due to some random slowness, those could be GC'd), and it is really better for it to be self contained.

What one could do is return a different type than PendingRangeChange -- but at that point we have created two types that are similarish (since the external change needs the changeID and some subset of ReplicaChange), and seemed like software over-engineering.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1602 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is this the only place in which this is done? There's enough going on here to not want to have this duplicated.

Yes, this is the only place. The other places are decisive about undoing things (GC or AdjustPendingChangeDisposition). This is the one path where we have latest source of truth and first apply it to the membership state and then make a decision on what to do with the pending changes. Which is why the preceding if-block also has the so we pass !applyLoadChange comment.


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_enacted_gc line 255 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

What is the question here?

oops, I forgot to investigate. It was to figure out why we have top-k-ranges for local-store-id=1. It was just how the test was setup. I've tweaked the test to show both the stale and fresh view of top-k, with some commentary.

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up I didn’t get to this today and will take a pass tomorrow. The main thing I wanted to look into is when prev becomes mutable after creation and may no longer be compatible with next as a pair, will that break any of the helper-function (like

for _, c := range prc.pendingReplicaChanges {
or
if c.prev.IsLeaseholder && !c.next.IsLeaseholder {
) checks which might be assuming a fixed pairing. Lmk if this has already been looked at tho.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I wanted to look into is when prev becomes mutable after creation and may no longer be compatible with next as a pair, will that break any of the helper-function

That is a good point. I'll make an attempt to clean this up more and ping when it is ready.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@wenyihu6 wenyihu6 self-requested a review November 20, 2025 14:29
Instead, the ReplicaChange.prev field is updated to reflect the latest
state reported by the leaseholder.

In addition to simplifyin the code, it fixes an existing issue where an
undo could rollback to a state preceding the latest leaseholder state.

Informs cockroachdb#157049

Epic: CRDB-55052

Release note: None
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_fail_lease_transfer line 73 at r3 (raw file):

# The replica from store s2 is removed because of GC. This is not correct.
#
# TODO: fix after cleanup PR is merged and this rebases on top.

I've added this test which indicates a problem.
This is related to the hazard raised by @wenyihu6

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like me to hold off on the review, or should I continue reviewing with the understanding that this will be addressed in a follow-up PR?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Nov 20, 2025
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like me to hold off on the review, or should I continue reviewing with the understanding that this will be addressed in a follow-up PR?

I've just created #158153 which will need to merge first, and this will be rebased on top of it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants